DOCUMENT RESUME

ED 337 041 FL 019 755 



AUTHOR        Navarrete, Cecilia; And Others
TITLE         Informal Assessment in Educational Evaluation:
              Implications for Bilingual Education Programs.
INSTITUTION   National Clearinghouse for Bilingual Education,
              Washington, DC.
SPONS AGENCY  Office of Bilingual Education and Minority Languages
              Affairs (ED), Washington, DC.
PUB DATE      90
CONTRACT      289004001; T288003002
NOTE          26p.
PUB TYPE      Reports - Descriptive (141)



EDRS PRICE    MF01/PC02 Plus Postage.
DESCRIPTORS   *Bilingual Education Programs; Comparative Analysis;
              Elementary Secondary Education; *Evaluation Methods;
              Formative Evaluation; *Informal Assessment; *Language
              Tests; *Second Language Instruction; Standardized
              Tests; *Student Evaluation
IDENTIFIERS   Elementary Secondary Education Act Title VII



ABSTRACT 

Given the controversy over the use of standardized 
tests that rely heavily on multiple-choice items reflecting the 
language, culture, and/or learning style of the middle class 
majority, arguments are advanced for the use of alternative, 
supplemental forms of assessment. Informal assessment is defined as
techniques that can easily be incorporated into classroom routines
and learning activities, and are identified as unstructured (e.g.,
writing samples, homework, journals, games, debates) or structured
(e.g., checklists, cloze tests, rating scales, questionnaires,
structured interviews). Guidelines for informal assessment are
offered, including scoring procedures such as holistic or analytic 
procedures, general impression markings, or error patterns. 
Guidelines for using another method, student portfolios, are 
detailed. Guidance is also offered for the evaluation of programs 
funded under the Elementary and Secondary Education Act Title VII, 
including reporting assessment data. It is concluded that informal 
techniques are needed to provide the continuous, ongoing measurement 
of student growth needed for formative evaluation and for planning 
instructional strategies. Contains 23 references. (LB) 



* Reproductions supplied by EDRS are the best that can be made 

* from the original document. 




National Clearinghouse for Bilingual Education

Information Guide Series

Summer



Informal Assessment in 
Educational Evaluation:
Implications for Bilingual 
Education Programs 



Cecilia Navarrete
Judith Wilde
Chris Nelson
Robert Martinez
Gary Hargett



"PERMISSION TO REPRODUCE THIS
MATERIAL HAS BEEN GRANTED BY 



TO THE EDUCATIONAL RESOURCES 
INFORMATION CENTER (ERIC)." 






The National Clearinghouse for Bilingual Education (NCBE) is funded
by the U.S. Department of Education's Office of Bilingual Education
and Minority Languages Affairs (OBEMLA) and is operated under
Contract No. 289004001 by The George Washington University's
Center for the Study of Education and National Development, jointly
with the Center for Applied Linguistics. The contents of this
publication do not necessarily reflect the views or policies of the
Department of Education, nor does the mention of trade names,
commercial products, or organizations imply endorsement by the U.S.
Government. Readers are free to duplicate and use these materials in
keeping with accepted publication standards. NCBE requests that
proper credit be given in the event of reproduction.






Introduction

Central to the evaluation of any educational program are the
instruments and procedures used to assess that program's effects.
Many programs use commercially available standardized tests to
measure academic achievement or language proficiency. There are
good reasons for doing so. Standardized tests usually are
administered annually by school districts, providing a ready source
of achievement data. Test publishers provide information about the
test's validity and reliability, fulfilling another requirement of
evaluation. And standardized test scores generally have been
accepted by educators and the community.

However, recent research on student achievement has focused on
problems associated with overreliance on standardized tests (e.g.,
Haney & Madaus 1989; Marston & Magnusson 1987; Pikulski 1990;
Shepard 1989). Alternative approaches to assessing student progress
have been suggested that address many of the problems associated
with standardized tests (e.g., Marston & Magnusson 1987; Rogers
1989; Wiggins 1989; Wolf 1989). The purpose of this guide is to
review some of the problems associated with standardized testing,
describe alternative assessment approaches, and discuss how these
approaches might be employed by bilingual educators to supplement
the use of standardized tests.

Concerns with Standardized Testing

Criticisms of standardized tests seem to have grown in proportion
to the frequency with which, and the purposes for which, they are
used (Haney & Madaus 1989). Pikulski (1990) suggests that the
greatest misuse of standardized tests may be their overuse. Many
districts now administer such tests at every grade level, define
success or failure of programs in terms of test scores, and even
link teacher and administrator salaries and job security to student
performance on standardized tests. Three areas often criticized in
regard to standardized tests are content, item format, and item
bias. Standardized tests are designed to provide the best match
possible to what is perceived to be the "typical" curriculum at a
specific grade level. Because a bilingual education program is built
on objectives unique to the needs of its students, many of the items
on a standardized test may not measure the objectives or content of
that program. Thus a standardized test may have low content validity
for specific bilingual education programs. In such a situation, the
test might not be sensitive to actual student progress.
Consequently, the program, as measured by this test, would appear to
be ineffective.



Standardized achievement tests generally rely heavily on
multiple-choice items. This item format allows for greater content
coverage as well as objective and efficient scoring. However, the
response required by the format is recognition of the correct
answer. This type of response does not necessarily match the type
of responses students regularly make in the classroom, e.g., the
production or synthesis of information. If students are not used to
responding within the structure imposed by the item format, their
test performance may suffer. On the other hand, students may
recognize the correct form when it is presented as a discrete item
in a test format, but fail to use that form correctly in
communication contexts. In this case, a standardized test may make
the student appear more proficient than performance would suggest.

Further, some tests have been criticized for including items that
are biased against certain kinds of students (e.g., ethnic
minorities, limited English proficient, rural, inner-city). The
basis for this criticism is that the items reflect the language,
culture, and/or learning style of the middle-class majority (Neill &
Medina 1989). Although test companies have attempted to write
culture-free items, the removal of questions from a meaningful
context has proved problematic for minority students.

Thus, there are strong arguments in favor of educators considering
the use of alternative forms of assessment to supplement
standardized test information. These alternate assessments should be
timely, not time consuming, truly representative of the curriculum,
and tangibly meaningful to the teacher and student. Techniques of
informal assessment have the potential to meet these criteria as
well as programmatic requirements for formative and summative
evaluations. Validity and reliability are not exclusive properties
of formal, norm-referenced tests. Informal techniques are valid if
they measure the skills and knowledge imparted by the project; they
are reliable if they measure consistently and accurately.

Defining Informal Assessment

"Formal" and "informal" are not technical psychometric terms;
therefore, there are no uniformly accepted definitions. "Informal" is
used here to indicate techniques that can easily be incorporated into
classroom routines and learning activities. Informal assessment
techniques can be used at any time without interfering with
instructional time. Their results are indicative of the student's
performance on the skill or subject of interest. Unlike standardized
tests, they are not intended to provide a comparison to a broader
group beyond the students in the local project.



This is not to say that informal assessment is casual or lacking 
in rigor. Formal tests assume a single set of expectations for all stu- 
dents and come with prescribed criteria for scoring and interpreta- 
tion. Informal assessment, on the other hand, requires a clear under- 
standing of the levels of ability the students bring with them. Only 
then may assessment activities be selected that students can attempt 
reasonably. Informal assessment seeks to identify the strengths and 
needs of individual students without regard to grade or age norms. 

Informal Assessment Techniques

Methods for informal assessment can be divided into two main
types: unstructured (e.g., student work samples, journals) and
structured (e.g., checklists, observations). The unstructured
methods frequently are somewhat more difficult to score and
evaluate, but they can provide a great deal of valuable information
about the skills of the children, particularly in the areas of
language proficiency. Structured methods can be reliable and valid
techniques when time is spent creating the "scoring" procedures.

While informal assessment utilizes open-ended exercises
reflecting student learning, teachers (and students) can infer "from
the mere presence of concepts, as well as correct application, that
the student possesses the intended outcomes" (Muir & Wells 1983,
p. 9^). Another important aspect of informal assessments is that
they actively involve the students in the evaluation process; they
are not just paper-and-pencil tests.

Unstructured Assessment Techniques

Unstructured techniques for assessing students can run the
gamut from writing stories to playing games and include both
written and oral activities. The range of possible activities is
limited only by the creativity of the teacher and students. Table 1
presents several illustrative unstructured assessments/techniques.

Structured Assessment Techniques

Structured assessments are planned by the teacher much more
specifically than are unstructured assessments. As the examples
listed and described in Table 2 indicate, structured assessment
measures are more varied than unstructured ones. Indeed, some of
them are types of tests of one kind or another. In each case,
definitely "right" and "wrong," "completed" or "not completed"
determinations can be made. Consequently, the scoring of structured
assessment activities is easier than the scoring of unstructured
assessment activities.

Table 1
Types of unstructured assessment techniques



Writing Samples 

When students write anything on specific topics, their products 
can be scored by using one of the techniques described in Table 
3. Other creative writing samples that can be used to assess stu- 
dent progress include newspapers, newsletters, collages, graffiti 
walls, scripts for a play, and language experience stories. 

Homework 

Any written work students do alone, either in class or in the 
home, can be gathered and used to assess student progress.
With teacher guidance, students can participate in diagnosing 
and remediating their own errors. In addition, students' inter- 
ests, abilities, and efforts can be monitored across time. 



Logs or journals 

An individual method of writing. Teachers can review on a
daily, weekly, or quarterly basis to determine how students are
perceiving their learning processes as well as shaping their
ideas and strengths for more formal writing which occurs in
other activities.



Games 

Games can provide students with a challenging method for 
increasing their skills in various areas such as math, spelling,
naming categories of objects/people, and so on. 



Debates 

Students' oral work can be evaluated informally in debates by 
assessing their oral presentation skills in terms of their ability to 
understand concepts and present them to others in an orderly 
fashion. 






Brainstorming

This technique can be used successfully with all ages of chil- 
dren to determine what may already be known about a particular 
topic. Students often feel free to participate because there is no 
criticism or judgment. 

Story retelling 

This technique can be used in either oral or written formats. It 
provides information on a wide range of language-based abili- 
ties. Recall is part of retelling, but teachers can use it to deter- 
mine whether children understood the point of the story and 
what problems children have in organizing the elements of the
story into a coherent whole. This also can be used to share cul- 
tural heritage when children are asked to retell a story in class 
that is part of their family heritage. 

Anecdotal 

This method can be used by teachers to record behaviors and 
students' progress. These comments can include behavioral, 
emotional, and academic information. For instance, "Jaime sat 
for five minutes before beginning his assignment." These 
should be written carefully, avoiding judgmental words.

Naturalistic 

Related to anecdotal records, this type of observation may take 
the form of notes written at the end of the day by a teacher.
They may record what occurred on the playground, in the class- 
room, among students, or may just reflect the general classroom 
atmosphere. 






Table 2
Types of structured informal assessments



Checklists 

Checklists specify student behaviors or products expected dur- 
ing progression through the curriculum. The items on the check- 
list may be content area objectives. A checklist is considered to 
be a type of observational technique. Because observers check 
only the presence or absence of the behavior or product, check- 
lists generally are reliable and relatively easy to use. Used over 
time, checklists can document students' rate and degree of 
accomplishment within the curriculum. 



Cloze Tests 

Cloze tests are composed of text from which words have been 
deleted randomly. Students fill in the blanks based on their com- 
prehension of the context of the passage. The procedure is 
intended to provide a measure of reading comprehension. 

Criterion-referenced Tests 

Criterion-referenced tests are sometimes included as a type of
informal assessment. This type of test is tied directly to
instructional objectives, measures progress through the curriculum,
and can be used for specific instructional planning. In order for
the test to reflect a particular curriculum, criterion-referenced
tests often are developed locally by teachers or a school district.
Student performance is evaluated relative to mastery of the
objectives, with a minimum performance level being used to define
mastery.

Rating Scales 

This is an assessment technique often associated with observa- 
tion of student work or behaviors. Rather than recording the 
"presence" or "absence" of a behavior or skill, the observer sub- 
jectively rates each item according to some dimension of inter- 
est. For example, students might be rated on how proficient they 
are on different elements of an oral presentation to the class. 
Each element may be rated on a 1 to 5 scale, with 5 representing 
the highest level of proficiency. 






Questionnaires 

A questionnaire is a self-report assessment device on which stu- 
dents can provide information about areas of interest to the 
teacher. Questionnaire items can be written in a variety of for- 
mats and may be forced-choice (response alternatives are pro- 
vided) or open-ended (students answer questions in their own 
words). Questionnaires designed to provide alternative assess- 
ments of achievement or language proficiency may ask students 
to report how well they believe they are performing in a particular
subject or to indicate areas in which they would like more
help from the teacher. One type of questionnaire (which 
assumes that the student can read in the native language) 
requests that students check off in the first language the kinds of 
things they can do in English. For a questionnaire to provide 
accurate information, students must be able to read the items,
have the information to respond to the items, and have the writ- 
ing skills to respond. 

Miscue Analysis 

An informal assessment of strategies used by students when
reading aloud or retelling a story. Typically, students read a
grade-level passage (e.g., 250 words) while a judge follows
along with a duplicate copy of the passage. The student may be
tape recorded. Each time an error occurs, the judge circles the
word or phrase. A description of the actual error can be taken
from the tape after the session and analyzed for errors in
pronunciation, sentence structure, vocabulary, use of syntax, etc.
(see Goodman 1973).

Structured Interviews 

Structured interviews are essentially oral interview
questionnaires. Used as an alternative assessment of achievement or
language proficiency, the interview could be conducted with a
student or a group of students to obtain information of interest to
a teacher. As with written questionnaires, interview questions
could be forced-choice or open-ended. Because the information
exchange is entirely oral, it is important to keep interview
questions (including response alternatives for forced-choice items)
as simple and to-the-point as possible.
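
The cloze procedure described in Table 2 (words deleted randomly from a passage, with students filling in the blanks) can be illustrated with a short sketch. This is only an illustration under stated assumptions, not a validated instrument: the function names `make_cloze` and `score_cloze`, the 20 percent deletion rate, the blank marker, the fixed random seed, and the exact-word scoring rule are all choices made here and do not come from the guide.

```python
import random


def make_cloze(text, deletion_rate=0.2, seed=0, blank="_____"):
    """Return (cloze_text, answers) with words deleted at random.

    deletion_rate, seed, and blank are illustrative assumptions,
    not values taken from the guide.
    """
    rng = random.Random(seed)  # fixed seed keeps the test form reproducible
    words = text.split()
    # Delete at least one word, but never more words than the passage has.
    n_delete = min(len(words), max(1, round(len(words) * deletion_rate)))
    deleted = sorted(rng.sample(range(len(words)), n_delete))
    answers = [words[i] for i in deleted]
    for i in deleted:
        words[i] = blank
    return " ".join(words), answers


def score_cloze(answers, responses):
    """Count exact-word matches, ignoring case and edge punctuation."""
    def clean(word):
        return word.strip(".,;:!?\"'").lower()

    return sum(clean(a) == clean(r) for a, r in zip(answers, responses))


passage = ("The students read the short story and then wrote "
           "about it in their journals")
cloze_text, answers = make_cloze(passage)
print(cloze_text)
print(score_cloze(answers, answers), "of", len(answers), "blanks correct")
```

Exact-word scoring is the strictest option; a teacher might instead accept any word that fits the context, which would require a human judgment rather than a string comparison.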






Guidelines for Informal Assessment

In order to be effective, informal assessment activities must be
carefully planned. With appropriate planning, they can be reliable
and valid, and they can serve diagnostic purposes as well as
formative and summative evaluation purposes within all types of
bilingual education programs. General guidelines are presented here
to ensure these qualities. These guidelines apply both to formal and
informal assessments.



Validity and Reliability 

Standardized tests often are selected because their technical
manuals report validity and reliability characteristics. However, if
the content of these tests does not match the instructional objectives
of the project, their validity is negated. For example, many
standardized tests include structural analysis skills as part of the
reading or language arts sections. If a bilingual education project
does not teach structural analysis skills, concentrating instead on
the communicative aspects of reading/writing, such a test may not be
valid for that particular project.

The validity of informal measures can be established by demon- 
strating that the information obtained from a given technique 
reflects the project's instructional goals and objectives. If, for exam- 
ple, the project is teaching communicative writing, a collection of 
holistically scored writing samples would be a valid measure. 
Therefore, a first step toward validating the use of informal assess- 
ment measures is a clear statement of curricular expectations in 
terms of goals and objectives. 

Reliability, in its purest sense, refers to the ability of a measure 
to discriminate levels of competency among persons who take it. 
This is accomplished through the consistent application of scoring 
criteria. As with validity, the reliability of informal measures can be 
established by a clear statement of the expectations for student
performance in the curriculum and ensuring that teachers apply
consistent criteria based on those expectations. If the informal measures
accurately represent students' progress, and if they accurately dis- 
tinguish the differential progress made by individual students, they 
are reliable. 

Scoring Procedures 

Consideration has to be given to the reliability and validity of
the scoring procedures used in assessment, both formal and informal.
Among critical issues to be addressed are:







1. The validity of the judgment may be limited by the heavy 
dependency on the opinion of raters. To ensure high reliability, rat- 
ers must be trained to meet a set criterion (e.g., when judging ten 
individuals, raters should rate eight of them similarly). 

2. The scores must be specific to the learning situation. The 
scoring procedure must match the exercise or performance. To 
ensure this match, the purpose for assessment and the content to be 
assessed must first be decided. Agreement should also be sought on 
the descriptors developed for each scoring category to be used. 

3. Scoring procedures may be time consuming. To ensure suc- 
cess, the commitment and support of project and school personnel 
must be sought. Training and practice must be offered to the raters. 
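
The training criterion mentioned in the first issue above (raters judging ten individuals should rate eight of them similarly) can be checked with simple arithmetic. The sketch below is one possible reading, not a prescribed procedure: the function names, the `tolerance` parameter, and the interpretation of "similarly" as "identical, or within a stated number of scale points" are assumptions introduced here.

```python
def agreement_rate(ratings_a, ratings_b, tolerance=0):
    """Proportion of individuals that two raters scored similarly.

    Two ratings count as similar when they differ by no more than
    `tolerance` scale points (0 means identical). This reading of
    "similarly" is an assumption, not a definition from the guide.
    """
    if len(ratings_a) != len(ratings_b):
        raise ValueError("both raters must rate the same individuals")
    if not ratings_a:
        raise ValueError("at least one rating is required")
    similar = sum(abs(a - b) <= tolerance
                  for a, b in zip(ratings_a, ratings_b))
    return similar / len(ratings_a)


def meets_criterion(ratings_a, ratings_b, required=0.8, tolerance=0):
    """True when agreement reaches the training criterion (8 of 10 = 0.8)."""
    return agreement_rate(ratings_a, ratings_b, tolerance) >= required


# Hypothetical ratings of ten writing samples on a 1 to 5 scale.
rater_1 = [3, 4, 2, 5, 1, 3, 4, 2, 5, 3]
rater_2 = [3, 4, 2, 5, 1, 3, 4, 2, 4, 1]
print(agreement_rate(rater_1, rater_2))   # identical ratings on 8 of 10
print(meets_criterion(rater_1, rater_2))
```

Simple percent agreement is easy to explain to raters during training, although it does not correct for agreement that would occur by chance.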

Scoring procedures utilized in unstructured assessment activities
can be used to:

• measure progress and achievement in most content areas;

• measure literacy skills such as oral, reading, and written
production;

• develop summative and formative evaluations;

• make an initial diagnosis of a student's learning;

• guide and focus feedback on students' work;

• measure students' growth over time or for specific periods;

• determine the effectiveness of an instructional program;

• measure group differences between project students and
nonproject comparison groups;

• analyze the performance of an individual student; and

• correlate student outcomes with formal, standardized tests of
achievement and language proficiency.

Table 3 lists some general scoring procedures and a
brief summary description of popularly used techniques.
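
The analytic procedure summarized in Table 3 rates each criterion on a 1 to 5 scale and then applies a weighting scheme. The arithmetic can be sketched as follows. The guide's example gives only two weights (organization six times format, sentence structure five times format); the weights shown for usage and mechanics, the function name `analytic_score`, and the sample ratings are hypothetical values chosen for illustration.

```python
# Weights are expressed relative to "format" (weight 1). Only the 6x and
# 5x values come from the guide's example; the weights for usage and
# mechanics are ASSUMED here for illustration.
WEIGHTS = {
    "organization": 6,
    "sentence_structure": 5,
    "usage": 4,        # assumed
    "mechanics": 3,    # assumed
    "format": 1,
}


def analytic_score(ratings, weights=WEIGHTS, low=1, high=5):
    """Return (weighted_total, maximum_possible) for one work sample."""
    if set(ratings) != set(weights):
        raise ValueError("ratings must cover exactly the weighted criteria")
    for criterion, rating in ratings.items():
        if not low <= rating <= high:
            raise ValueError(
                f"{criterion}: rating {rating} is outside {low}-{high}")
    total = sum(weights[c] * ratings[c] for c in weights)
    maximum = high * sum(weights.values())
    return total, maximum


# Hypothetical ratings for one writing sample.
sample = {
    "organization": 4,
    "sentence_structure": 3,
    "usage": 5,
    "mechanics": 2,
    "format": 5,
}
total, maximum = analytic_score(sample)
print(f"{total} of {maximum} weighted points")
```

Reporting the total alongside the maximum makes scores comparable across assignments that use different weighting schemes.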



Table 3
Scoring assessments for unstructured activities

Holistic

A guided procedure for evaluating performance (oral or written)
as a whole rather than by its separate linguistic, rhetorical, or
informational features. Evaluation is achieved through the use
of a general scoring guide which lists detailed criteria for each
score. Holistic judgments are made on the closest match
between the criteria and the students' work. Criteria typically
are based on a rating scale that ranges from 3 to 10 points (3 =
low quality level and 10 = high quality level).

Primary Trait

A modified version of holistic scoring; the most difficult of all
holistic scoring procedures. Its primary purpose is to assess a
particular feature(s) of a discourse or a performance (oral or
written) rather than the students' work as a whole. Secondary
level traits also can be identified and scored using this
approach.

Analytic

A complex version of holistic scoring; students' work is
evaluated according to multiple criteria which are weighted based on
their level of importance in the learning situation. For example,
a writing sample can be assessed on organization, sentence
structure, usage, mechanics, and format. Each criterion is rated
on a 1 to 5 scale (1 = low and 5 = high). A weighting scheme
then is applied.

For example, the organization of an essay can be weighted
six times as much as the format; sentence structure five times as
much as format; and so on. This procedure can be used for
many purposes such as diagnostic placement, reclassification
and exiting, growth measurement, program evaluation, and
educational research.

Holistic Survey

Uses multiple samples of students' written work representing
three of five discourse modes: expressive, narrative, descriptive,
expository, and argumentative. Prior to scoring, students select
topics, repeat oral directions to demonstrate understanding of
the task, and have the opportunity to revise and edit their work
before submitting it for evaluation. The scoring procedures used
in the survey can include primary trait, analytic, or other holistic
scoring devices relevant to the goals and objectives of the
written assignment.

General Impression Markings

The simplest of the holistic procedures. The raters score the
papers by sorting papers along a continuum such as excellent to
poor, or acceptable to unacceptable. Critical to this approach is
that raters become "calibrated" to reach consensus by reading
and judging a large sample of papers.

Error Patterns

The assessment of students' written work or mathematical
computations. Scoring is based on a criterion that describes the
process or continuum of learning procedures that reflect
understanding of the skill or concept being assessed. A minimum of
three problems or written assignments are collected and
assessed to ensure that a student's error is not due to chance.

Assigning Grades

The "old standard." Students are assigned a number or letter
grade based on achievement, competency, or mastery levels.
Grades can be pass-fail or can reflect letter grades, such as A to
F. The major limitation of this scoring procedure is that grades
do not provide any information on the strengths or weaknesses
in a content area.

Combining Assessments for Evaluation

Different methods of combining types of structured and
unstructured informal assessments and associated scoring procedures
appear in the literature. While these approaches have different
labels and differ somewhat in philosophy, all are offered as
alternatives to standardized testing and use informal assessment to
measure student performance in the context of the curriculum.

1. Curriculum-based assessment uses the "material to be learned
as the basis for assessing the degree to which it has been learned"
(Tucker 1985, p. 199). This approach employs informal measures
such as writing samples, reading samples from the basal series, and
teacher-made spelling tests from the basal series. It has received a
good deal of attention in the special education literature (e.g., Deno
1985; Marston & Magnusson 1987) and was developed, in part, in
response to the need to address performance criteria specified in
students' individualized education plans (IEPs).

2. Ecological assessment (e.g., Bulgren & Knackendoffel 1986) 
evaluates student performance in the context of the environment. 
Sources of such data include student records, student interviews, 
observations, and collections of student products. Ecological assess- 
ment takes into account such things as the physical arrangement of 
the classroom; patterns of classroom activity; interactions between 
the teacher and students and among students; student learning 
styles; and expectations of student performance by parents, peers,
and teachers. 

3. Performance assessment (Stiggins 1984) provides a structure
for teachers to evaluate student behavior and/or products.
Assessments can take any form, depending on the behavior or product of
interest, and are designed according to four considerations: (1) a 
decision situation that defines the basic reason for conducting the 
assessment; (2) a test activity or exercise to which the student 
responds; (3) the student response; and (4) a rating or judgment of 
performance. 

Student Portfolios 

A method which can combine both informal and formal measures
is portfolio assessment (e.g., Wolf 1989). This method is rapidly
gaining in popularity because of its ability to assess student
work samples over the course of a school year or even longer. For
this reason a more detailed description of portfolios follows.

Portfolios provide an approach to organizing and summarizing
student data for programs interested in student- and teacher-oriented
assessments. They represent a philosophy that views assessment as
an integral component of instruction and the process of learning.
Using a wide variety of learning indicators gathered across multiple
educational situations over a specified period of time, portfolios can
provide an ecologically valid approach to assessing limited English
proficient students. While the approach is not new, portfolios are
useful in both formative and summative evaluations, which actively
involve teachers and students in assessment.






Portfolios are files or folders containing a variety of information 
that documents a student's experiences and accomplishments. The 
type of information collected for a portfolio can consist of summary 
descriptions of accomplishments, official records, and diary or
journal items. Summary descriptions of accomplishments can include
samples of the student's writing; artwork or other types of creations 
by the students; and testimonies from others (e.g., teachers, stu- 
dents, tutors) about the student's work. 

Formal records typically included in a portfolio are scores on 
standardized achievement and language proficiency tests; lists of 
memberships and participation in extracurricular clubs or events; 
lists of awards and recognitions; and letters of recommendation. 

Diaries or journals can be incorporated in portfolios to help stu- 
dents reflect on their learning. Excerpts from a diary or journal are 
selected for the portfolio to illustrate the students' view of their aca- 
demic and emotional development. 

Valencia (1990) recommends organizing the content of the
portfolios into two sections. In the first section, the actual work
of the students, or "raw data," is included. The information in this
section assists the teacher to examine students' ongoing work, give
feedback on their progress, and provide supporting documentation in
building an in-depth picture of the student's ability. The second
section consists of summary sheets or organizational frameworks for
synthesizing the student's work. The information summarized in the
second section is used to help teachers look systematically across
students, to make instructional decisions, and for reporting
purposes.
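
The two-section organization attributed to Valencia (1990) can be pictured as a simple data structure: a collection of raw work samples, plus a summary derived from them. The sketch below is only one hypothetical way to represent this; the class names, field names, and the choice of counts and mean scores as the summary are assumptions made for illustration and are not part of the recommendation itself.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class WorkSample:
    """One item of "raw data": a piece of actual student work."""
    kind: str                      # e.g. "writing sample", "journal"
    description: str
    score: Optional[float] = None  # None when the item was not scored


@dataclass
class Portfolio:
    student: str
    raw_data: list = field(default_factory=list)  # first section

    def add(self, sample: WorkSample) -> None:
        self.raw_data.append(sample)

    def summary_sheet(self) -> dict:
        """Second section: count and mean score for each kind of work."""
        grouped = {}
        for sample in self.raw_data:
            entry = grouped.setdefault(sample.kind,
                                       {"count": 0, "scores": []})
            entry["count"] += 1
            if sample.score is not None:
                entry["scores"].append(sample.score)
        return {
            kind: {
                "count": entry["count"],
                "mean_score": (sum(entry["scores"]) / len(entry["scores"])
                               if entry["scores"] else None),
            }
            for kind, entry in grouped.items()
        }


# Hypothetical contents for one student.
portfolio = Portfolio("Student A")
portfolio.add(WorkSample("writing sample", "story retelling", 3))
portfolio.add(WorkSample("writing sample", "letter to a friend", 5))
portfolio.add(WorkSample("journal", "week 1 entry"))
print(portfolio.summary_sheet())
```

Because the summary is computed from the raw data rather than stored separately, the same summary format can be produced for every student in a classroom.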

One major concern in using portfolios is with summarizing
information within and across classrooms in a consistent and
reliable manner, an issue discussed below.

Guidelines for Using Portfolios 
in Bilingual Education Evaluations 

As part of the bilingual education evaluation, portfolios can
be quite useful. They can: 

• be used to meet many of the bilingual education evaluation 
requirements; 

• involve both formal and informal assessment methods;

• offer a comprehensive view of students' academic achievement 
and linguistic proficiency; 






• provide more detailed information on those aspects of students' 
performance which are not readily measured by traditional exam- 
ining methods; 

• reflect the taught curriculum and individual child's learning 
experiences; 

• encourage teachers to use different ways to evaluate learning; 

• document the student's learning and progress; and 

• help teachers examine their own development and skills. 

Although the shape and form of portfolios may change from 
program to program, the real value of a portfolio lies in three areas. 
In the first area, portfolios have the potential to provide project 
teachers and students with a rich source of information to under- 
stand the development and progress of project students and to plan 
educational programs that enhance student learning and "showcase" 
their achievements. In the second area, portfolios allow for
reporting in a holistic and valid way. The information gathered in a
portfolio is taken from actual student work, and assessment focuses on
the whole of what a student learns, not on discrete and isolated facts
and figures. In the third area, formal and informal data can be used
in a nonadversarial effort to evaluate student learning in a
comprehensive and authentic manner.

Although portfolio assessment offers great flexibility and a 
holistic picture of students' development, several technical issues 
must be addressed to make portfolios valid for bilingual education 
evaluations. These issues are summarized in three organizational
guidelines which are based on current research and instructional 
practices in education (Au, Scheu, Kawakami, & Herman 1990; 
Jongsma 1989; Pikulski 1989; Simmons 1990; Stiggins 1984; 
Valencia 1990; Wolf 1989). 

1. Portfolios Must Have a Clear Purpose 

To be useful, information gathered for portfolios must reflect
the priorities of the program. It must be kept in mind that the
purpose of a bilingual education program evaluation stems from the
goals of the actual program. The first critical step, then, is to
identify and prioritize the key program goals of curriculum and
instruction. In developing goals for portfolio assessment, it will be helpful
to review (a) the state's current language arts and bilingual curricu- 
lum guidelines, (b) the district's or state's standardized achievement 
and language proficiency tests, and (c) the scope and sequence
charts of the reading and literacy materials that will be used with
the students. 

Note that the goals of a program should be broad and general, 
not overly specific, concrete, or isolated lesson objectives. For 
example, a goal may be written as "To learn reading comprehension
skills," or "To write fluently in English." If goals are too specific,
portfolios can get cluttered with information that may not be useful 
to the student, teacher, administrator, or evaluator. 

2. Portfolios Must Interact With the Curriculum 

This issue also is known as content validity. It is important that
the information in portfolios accurately and authentically represent
the content and instruction of the program. Content validity can be
maximized by making sure portfolios contain (a) a clear purpose of
the assessment, (b) a close link between the behaviors or products
collected and the evaluation goals, (c) a wide variety of classroom
exercises or tasks measuring the same skill, and (d) a cross-check of
student capabilities based on both formal tests and informal
assessments.

When deciding on the type of assessment information to include
in the portfolio, existing instructional activities should be used.
Most likely, the information will be appropriate for portfolios. For
example, one of the goals in the Kamehameha Elementary Education
Program (KEEP) in Hawaii is to increase students' interest in
reading and expand their repertoire of book reading. To determine
to what extent this goal is achieved, teachers use a checklist to
examine students' reading logs. The logs include a list of the titles
and authors of the books students have read. With this information,
teachers review each student's list in terms of level of
appropriateness, genres read, and book preferences. Students also are
asked to include dates the books were read in order to determine the
number of books read over specified periods of time. The information
thus obtained is then summarized in the checklist and used to monitor
and report on students' learning as well as to improve instruction.
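
The tally behind a reading-log checklist of this kind can be sketched in a few lines of Python. This is only an illustration, not the procedure KEEP itself uses: the record layout (title, author, genre, date read), the function name summarize_reading_log, and the sample entries are all assumptions made for the example.

```python
from collections import Counter
from datetime import date


def summarize_reading_log(log, start, end):
    """Count the books read between start and end (inclusive) and tally genres."""
    in_period = [entry for entry in log if start <= entry["date_read"] <= end]
    genres = Counter(entry["genre"] for entry in in_period)
    return {"books_read": len(in_period), "genres": dict(genres)}


# Hypothetical log for one student; titles, genres, and dates are invented.
reading_log = [
    {"title": "Book A", "author": "Author 1", "genre": "folktale",
     "date_read": date(1990, 9, 14)},
    {"title": "Book B", "author": "Author 2", "genre": "poetry",
     "date_read": date(1990, 10, 2)},
    {"title": "Book C", "author": "Author 3", "genre": "folktale",
     "date_read": date(1990, 11, 20)},
]

# Summarize the first two months of the school year.
fall_summary = summarize_reading_log(
    reading_log, date(1990, 9, 1), date(1990, 10, 31)
)
print(fall_summary)
```

Judgments such as level of appropriateness and book preferences still rest with the teacher; the sketch covers only the countable parts of the checklist.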

3. Portfolios Must Be Assessed Reliably

Reliability in portfolios may be defined as the level of
consistency or stability of the devices used to assess student progress.
At present, there are no set guidelines for establishing reliability for
portfolios. The major reason is that portfolios, by their nature, are
composed of a broad and varied collection of students' work, from
oral reading, comprehension checks, and teachers' observation
notes to formal tests of the students' achievement or proficiency.
Equally important, large-scale portfolio assessment has only
recently been investigated as an alternative device in educational
evaluation and research (Brandt 1988; Burnham 1986; Elbow &
Belanoff 1986; Simmons 1990; Wolf 1989).

However, there are several criteria which are recommended in 
estimating the reliability of portfolios for large-scale assessment. 
These criteria apply both at the classroom level and at the grade 
level. Teachers and administrators must, at a minimum, be able to 

• design clear scoring criteria in order to maximize the raters' 
understanding of the categories to be evaluated; 

• maintain objectivity in assessing student work by periodically 
checking the consistency of ratings given to students' work in the 
same area; 

• ensure inter-rater reliability when more than one person is
involved in the scoring process;

• plan clear observation guidelines in order to make reliable and
systematic observations;

• use objective terminology when describing student behavior; 

• allow time to test the observation instrument and its ability to pick 
up the information desired; 

• check for inter-rater reliability as appropriate; 

• keep consistent and continuous records of the students to measure 
their development and learning outcomes; and 

• check judgments using multiple measures such as other tests and
information sources.
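
One way to check inter-rater reliability is sketched below in Python, under stated assumptions. The source does not prescribe a statistic; percent agreement and Cohen's kappa are used here only because they are common choices when two raters score the same pieces of work on the same rubric. The function names and the sample ratings are hypothetical.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Proportion of items on which two raters gave the same score."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same non-empty set of items.")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters pick the same category by chance.
    expected = sum(
        counts_a[c] * counts_b[c] for c in set(counts_a) | set(counts_b)
    ) / (n * n)
    if expected == 1:
        return 1.0  # Both raters used a single identical category throughout.
    return (observed - expected) / (1 - expected)


# Hypothetical holistic scores (1-4) given by two teachers to eight writing samples.
teacher_1 = [3, 2, 4, 3, 1, 2, 3, 4]
teacher_2 = [3, 2, 3, 3, 1, 2, 4, 4]

agreement = percent_agreement(teacher_1, teacher_2)
kappa = cohens_kappa(teacher_1, teacher_2)
print(f"agreement = {agreement:.2f}, kappa = {kappa:.2f}")
```

Low values on either index would suggest that the scoring criteria need to be clarified before scores are combined across raters.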

A major issue that arises in the use of portfolios relates to the 
problem of summarizing data within and across classrooms in a 
consistent and reliable manner. Using the guidelines suggested 
above in the planning and organization of portfolios will provide for 
reliable and valid assessment. These guidelines, however, are only a 
framework for the assessment procedures and will need to be 
applied by teachers to determine their effectiveness and practicality. 



Evaluation of ESEA Title VII-Funded Programs

Title VII of the Elementary and Secondary Education Act provides
funding to school districts for implementing bilingual education
programs to help limited English proficient students learn English.
Each program receiving funding under Title VII is required to
submit yearly program evaluation results.

Title VII regulations focus on summative evaluation, the
judgment of the effectiveness of a program. Formative evaluation,
which provides feedback during a program so that the program may
be improved, is also a concern. Informal assessment procedures can
be used for both types of evaluation.

Informal measures are ideal for formative evaluation, because
they can be given frequently and lend themselves to nearly
immediate scoring and interpretation. To the extent that informal
measures are embedded in the curriculum, they provide formative
information as to whether the expected progress is being made. Where
informal measures show that progress has been made, they confirm
the decision to move students forward in the curriculum. Where
they show that the expected progress has not been made, they
suggest modification of the current approach or perhaps may call for a
different instructional approach.

Informal measures also may be particularly appropriate for
diagnostic assessment of individual students. As mentioned above,
formal standardized tests may not necessarily focus on the skills
that a specific group of limited English proficient students are being
taught. Informal measures should be drawn directly from the work
the class is engaged in and thus provide evidence of mastery of
intended objectives. The teacher can examine each student's work
for that evidence.

Informal measures tend to be production or performance
measures. This means children are tested by actually doing whatever it
is the teacher hopes they can do. For example, limited English
proficient children often confuse she/he/it or leave off the "s" on third
person singular English verbs (e.g., "she run" for "she runs"). An
informal measure should demonstrate whether the child can
produce the distinction between she/he/it or say "she runs." In contrast,
most formal tests are indirect measures that ask the child to
recognize a correct form (among several forms, some of which are
"incorrect"). Recognition and production involve very different
skills. Recognition of a linguistic distinction does not imply the
ability to produce that distinction. Thus a formal measure might
give an erroneous indication of a student's competence.

Informal assessments also can be used for summative evaluation
reports. In general, three conditions allow for the use of informal
assessment in summative evaluation. First, goals must be
operationalized as clearly stated performances that can be measured.
Second, informal measures must be selected and applied consistently
and accurately in order to match the operationalized goals. And third,
the measures must be scored in a way that permits the aggregation
of individual scores into group data that represent performance
vis-à-vis the stated goals. This means that either the assessments, the
scoring procedures, or both must have uniformity across the
students.
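
The third condition, aggregating individual scores into group data, can be illustrated with a short Python sketch. It assumes that every student was scored on the same rubric scale and that the operationalized goal has been expressed as a minimum score; the function name aggregate_scores, the criterion value, and the sample scores are hypothetical.

```python
from statistics import mean


def aggregate_scores(scores, criterion):
    """Summarize uniformly scored student work against a goal criterion.

    scores: dict mapping a student identifier to a rubric score.
    criterion: minimum score that counts as meeting the stated goal.
    """
    values = list(scores.values())
    if not values:
        raise ValueError("At least one student score is required.")
    met = sum(value >= criterion for value in values)
    return {
        "n": len(values),
        "mean": mean(values),
        "percent_meeting_goal": 100 * met / len(values),
    }


# Hypothetical rubric scores (1-5) on one writing task for four students.
class_scores = {"student_1": 4, "student_2": 2, "student_3": 3, "student_4": 5}

group_summary = aggregate_scores(class_scores, criterion=3)
print(group_summary)
```

A summary like this is meaningful only if the same task and scoring procedure were applied to every student, which is the uniformity the paragraph above calls for.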



Reporting Assessment Data

Title VII evaluation regulations require that both formal and infor- 
mal assessment data be summarized across students. These regula- 
tions allow for the collection of both qualitative and quantitative 
data. Descriptions of pedagogical materials, methods, and tech- 
niques utilized in the program certainly can be addressed using 
either qualitative or quantitative data. Reporting the academic 
achievement of project participants using valid and reliable meas- 
ures essentially requires a quantitative approach. 

Informal assessment of student achievement or language
proficiency, when used to supplement standardized achievement test
data, probably is approached best from a quantitative perspective. 
Quantitative data collected toward this end meets the current Title 
VII evaluation regulations for reporting student achievement and 
proficiency data and has the potential to be aggregated more readily 
across students. Efficiency is important in accumulating data for an 
evaluation. Data can be collected both for purposes of feedback to 
program personnel and for the evaluation reports submitted to the 
Office of Bilingual Education and Minority Languages Affairs 
(OBEMLA). Thus OBEMLA describes types of data to be col- 
lected, but formal versus informal assessment approaches are not 
prescribed. The data required can be summarized into three areas: 
student outcomes, program implementation, and technical
standards. Program staff and evaluators should refer to the appropriate
Federal Regulations for specific information.²



Student Outcome Data 

In reporting the academic achievement and language
proficiency outcomes of project students, formal and informal
assessments can be combined to meet the federal evaluation regulations.
Information on formal assessment may indicate how well students
are performing in relationship to other students across the nation,
state, and/or school district as well as at the school and classroom
level. In addition, reporting achievement scores by subscale (e.g.,
vocabulary, grammar, comprehension) rather than total scores (e.g.,
reading) provides a finer breakdown and understanding of students'
strengths and weaknesses and pinpoints areas of improvement.

Synthesized informal data can be used to support formal test 
findings or to provide documentation of the students' progress in 
instructional areas not covered in a formal test. In addition, informal 
data can provide more specific information about student progress 
through the curriculum and can provide it continuously throughout 
the year. The key to using informal data is that the information per- 
tains to program goals and related objectives. Informal data can 
answer questions such as: What skills or concepts did the student 
actually learn during the academic year? To what extent did stu- 
dents have the opportunity to acquire the particular skills or con- 
cepts? What progress did the students make over the year? How did 
the students' attitudes affect learning? 

Formal or informal approaches can be used to address rates of
change as long as the information on each participating student is 
maintained. The information also must be collected in a continuous 
and accurate manner. 
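
As a rough illustration of why continuous, dated records matter, the following Python sketch computes a simple rate of change for one student. It is an assumption-laden example, not a method given in the regulations: the rate is taken as the score change between the earliest and latest observation, expressed per 30 days, and the function name and sample records are hypothetical.

```python
from datetime import date


def rate_of_change(records):
    """Score change per 30 days between the earliest and latest observation.

    records: list of (date, score) pairs kept for one student.
    """
    if len(records) < 2:
        raise ValueError("At least two dated observations are needed.")
    ordered = sorted(records)
    (first_day, first_score), (last_day, last_score) = ordered[0], ordered[-1]
    days = (last_day - first_day).days
    if days == 0:
        raise ValueError("Observations must fall on different dates.")
    return (last_score - first_score) * 30 / days


# Hypothetical checklist scores recorded for one student during the fall.
student_records = [
    (date(1990, 9, 1), 10),
    (date(1990, 10, 1), 14),
    (date(1990, 11, 30), 19),
]

monthly_gain = rate_of_change(student_records)
print(monthly_gain)
```

If records are missing for part of the year, a calculation like this cannot be made for that student, which is why the information must be maintained for each participant.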

An additional Title VII evaluation requirement is that project 
student outcomes be compared to those of a nonproject comparison 
group. In addressing this requirement, similar formal and informal 
assessment procedures should be utilized where possible. However,
if access to a nonproject comparison group is limited, then informa- 
tion for project and nonproject groups should be provided at least 
on academic achievement, language proficiency, and, if available, 
rates of change in attendance, drop-out, and postsecondary enroll- 
ment. This data collection provides a valid comparison of the 
project students' learning outcomes and answers the question, 
"How do project students compare to similar students not receiving 
project support?" 
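
A descriptive version of this comparison can be sketched in Python as follows. The example assumes that the same pre and post measure was given to both groups and compares mean gains only; it is not a significance test, and the function names and scores are hypothetical.

```python
from statistics import mean


def mean_gain(pre_post_pairs):
    """Average post-minus-pre gain for a group of students."""
    return mean(post - pre for pre, post in pre_post_pairs)


def compare_groups(project, comparison):
    """Descriptive comparison of mean gains for project and nonproject students."""
    project_gain = mean_gain(project)
    comparison_gain = mean_gain(comparison)
    return {
        "project_gain": project_gain,
        "comparison_gain": comparison_gain,
        "difference": project_gain - comparison_gain,
    }


# Hypothetical (pre, post) scores on the same measure for each group.
project_students = [(20, 30), (25, 33), (18, 27)]
comparison_students = [(21, 27), (24, 29), (19, 23)]

comparison_result = compare_groups(project_students, comparison_students)
print(comparison_result)
```

Whether a difference of this kind can be attributed to the project depends on how similar the comparison group is to the project students, a point taken up under the technical standards.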

Program Implementation 

The essential purpose of evaluating the implementation of the 
program is to answer the question, "Does the unique combination of 
activities, instructional practices, materials, and role of the staff in 
the project lead to the achievement of its objectives?" Under Title
VII, information is required on program implementation, including a
description of instructional activities, time spent in those activities,
and background on the staff responsible for carrying out the
program.

One informal technique for collecting information is through 
existing information sources. In portfolios, for example, informa- 
tion can be collected on the students' backgrounds, needs, and com- 
petencies as well as on specific activities completed for children 
who may be handicapped or gifted and talented. Attendance lists 
also can be used in calculating the amount of time students received 
instructional services in the project. Information on the instructional 
time, specific educational activities, and instructional strategies can
be collected and reported from teacher lesson plans or from teacher 
activity logs. The educational and professional data about the staff 
can be found in their job application forms. While this method can 
produce accurate information, a major concern in relying on this 
approach is that data collected may be incomplete or not relevant. 

Another approach to use in collecting the required information 
is through the use of self-reports such as questionnaires and inter- 
views. These methods of data collection can be used in two ways. 
First, information gathered may provide "recollected" or indirect 
versions of how the program was implemented. Since recollected 
data is relatively weak, also include supporting evidence, such as 
observations or existing records, whenever possible. On the other 
hand, these methods also can be used to collect information in an 
ongoing fashion which can result in more reliable data. 

Technical Standards 

For programs to have meaning they must have a standard point 
of reference. A standard is a set of baseline criteria that provides 
principles or rules for determining the quality or value of an evalua- 
tion. Title VII regulations require a description of specific technical 
standards in the annual evaluation report. These standards include a
description of the data collection instruments and procedures, test 
administration and scoring, and accuracy of the evaluation proce- 
dures as well as the process for selecting a nonproject comparison 
group. When using either formal or informal assessment, describe 
how: 

1. the nonproject group was selected; 

2. conclusions apply to the persons, schools, or agencies served by
the project; 

3. instruments consistently and accurately measure progress toward 
accomplishing the objectives of the project; and 



4. instruments appropriately consider factors such as age, grade,
language, degree of language fluency, and background of the
persons being served.

The standards are intended to ensure that an evaluation conveys
sound information about the features of the program. These
standards require that the program information be technically adequate
and that conclusions be linked logically to the data.

Conclusion

The pervasive theme in data collection is that bilingual education
programs should strive to make their evaluations practical, viable,
and accurate. By using a combination of both formal and informal
assessments, these requirements can be met effectively. We have not
proposed that informal assessment be used in place of standardized
tests; rather, we propose that it be used in conjunction with
standardized tests.

While these formal measures provide general year-to-year 
progress of students in global content areas, they cannot provide the 
continuous, ongoing measurement of student growth needed for for- 
mative evaluation and for planning instructional strategies. Informal 
techniques can do so. The challenges faced in using informal
assessment in the evaluation of bilingual education programs are 
the following: 

• First, can informal assessment be held up to the same psycho- 
metric standards applied to formal assessment? With techniques 
such as those suggested above, reliable and valid informal assess- 
ment can be developed. 

• Second, can further procedures be developed for aggregating 
the diverse information provided by informal assessment into a
meaningful set of indices that allow us to state whether or not our 
programs are effective? 

We believe these challenges can be met within bilingual educa- 
tion by using current understanding of informal assessment as a 
foundation on which to build.



Endnotes

1. This document has been produced by staff at the Evaluation
Assistance Center (West) under contract #T288003002 with the
U.S. Department of Education.

2. For those wishing to consult these regulations, see the
Department of Education 34 CFR Part 500.50-500.52 as published in the
Federal Register June 19, 1986 and October 5, 1988.







Au, K., Scheu, A., Kawakami, A., & Herman, P. (1990).
Assessment and accountability in a whole literacy curriculum. The
Reading Teacher, 43, 574-578.

Brandt, R. (1988). On assessment in the arts: A conversation with 
Howard Gardner. Educational Leadership, 45 (4), 30-34. 

Bulgren, J. A., & Knackendoffel, A. (1986). Ecological assessment: 
An overview. The Pointer, 30 (2), 23-30. 

Burnham, C. (1986). Portfolio evaluation: Room to breathe and
grow. In Charles Bridges (Ed.), Training the teacher of college 
composition. Urbana, IL: National Council of Teachers of 
English. 

Deno, S. L. (1985). Curriculum-based measurement: The emerging 
alternative. Exceptional Children, 52, 219-232. 

Elbow, P., & Belanoff, P. (1986). Portfolios as a substitute for pro- 
ficiency examinations. College Composition and Communica- 
tion, 37, 336-339. 

Goodman, K. (1973). Analysis of oral reading miscues: Applied 
psycholinguistics. In F. Smith (Ed.), Psycholinguistics and
reading. New York: Holt, Rinehart and Winston, Inc. 

Haney, W., & Madaus, G. (1989). Searching for alternatives to 
standardized tests: Whys, whats, and whithers. Phi Delta 
Kappan, 70, 683-687. 

Jongsma, K. S. (1989). Portfolio assessment. The Reading Teacher, 
43, 264-265. 

Marston, D., & Magnusson, D. (1987). Curriculum-based measure- 
ment: An introduction. Minneapolis, MN: Minneapolis Public 
Schools. 

Muir, S., & Wells, C. (1983). Informal evaluation. Social Studies, 
74 (3), 95-99. 

Neill, D. M., & Medina, N. J. (1989). Standardized testing: Harmful 
to educational health. Phi Delta Kappan, 70, 688-697. 

Pikulski, J. J. (1990). Assessment: The role of tests in a literary 
assessment program. The Reading Teacher, 44, 686-688. 

Pikulski, J. J. (1989). The assessment of reading: A time for 
change? The Reading Teacher, 43, 80-81.

Rogers, V. (1989). Assessing the curriculum experienced by chil- 
dren. Phi Delta Kappan, 70, 714-717. 

Shepard, L. A. (1989). Why we need better assessments. Educa- 
tional Leadership, 46 (7), 4-9. 



Simmons, J. (1990). Portfolios as large scale assessment. Language 
Arts, 67, 262-267. 

Stiggins, R. J. (1984). Evaluating students by classroom observa- 
tion: Watching students grow. Washington, DC: National Edu- 
cation Association. 

Stiggins, R. J., & Bridgeford, N. J. (1985). The ecology of class- 
room assessment. Journal of Educational Measurement, 22, 
271-286. 

Tucker, J. A. (1985). Curriculum-based assessment: An introduc- 
tion. Exceptional Children, 52, 199-204. 

Valencia, S. (1990). A portfolio approach to classroom reading
assessment: The whys, whats, and hows. The Reading Teacher, 
43, 338-340. 

Wiggins, G. (1989). A true test: Toward more authentic and equita-
ble assessment. Phi Delta Kappan, 70, 703-713. 

Wolf, D. P. (1989). Portfolio assessment: Sampling student work. 
Educational Leadership, 46 (7), 35-39. 



About the Authors

The authors are on the staff of the Evaluation Assistance Center
(West) at the University of New Mexico.

Cecilia Navarrete, Senior Research Associate, received her Ph.D.
in Education from Stanford University.

Judith Wilde, Methodologist, received her Ph.D. in the
Psychological Foundations of Education from the University of
New Mexico.

Chris Nelson, Senior Research Associate, received her Ph.D. in
Educational Psychology and Research from the University of
Kansas.

Robert Martinez, Senior Research Associate, received his Ph.D. in
Educational Research from the University of New Mexico.

Gary Hargett, Research Associate, is a doctoral candidate in
Education at the University of Washington.



GOVERNMENT PRINTING OFFICE: 1990 273-55 i


